
    DS 680-001: Natural Language Processing


    Deep Serial Number: Computational Watermarking for DNN Intellectual Property Protection

    In this paper, we introduce DSN (Deep Serial Number), a new watermarking approach that can prevent a stolen model from being deployed by unauthorized parties. Recently, watermarking DNNs has emerged as a new research direction for owners to claim ownership of DNN models. However, the verification schemes of existing watermarking approaches are vulnerable to various watermark attacks. Different from existing work that embeds identification information into DNNs, we explore a new DNN Intellectual Property Protection mechanism that prevents adversaries from deploying stolen deep neural networks. Motivated by the success of serial numbers in protecting conventional software IP, we make the first attempt to embed a serial number into DNNs. Specifically, the proposed DSN is implemented in a knowledge distillation framework, where a private teacher DNN is first trained and its knowledge is then distilled and transferred to a series of customized student DNNs. During the distillation process, each customer DNN is augmented with a unique serial number, i.e., an encrypted 0/1 bit trigger pattern. A customer DNN works properly only when the customer enters the valid serial number. The embedded serial number can also serve as a strong watermark for ownership verification. Experiments on various applications indicate that DSN is effective in preventing unauthorized deployment without sacrificing the original DNN performance. Further experimental analysis shows that DSN is resistant to different categories of attacks.
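
    As a rough illustration of the mechanism described above, the following sketch (assumed PyTorch code, not the authors' implementation) conditions a student network on a binary serial-number trigger during knowledge distillation: with the valid serial the student is trained to match the private teacher, while with a random invalid serial it is pushed toward uninformative, near-uniform outputs. The names `StudentWithSerial` and `distill_step`, the gating design, and the equal loss weighting are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch (not the authors' code): a student DNN conditioned on a
# binary "serial number" trigger. Distillation teaches it to follow the private
# teacher only when the valid serial is supplied, and to output near-uniform,
# uninformative predictions otherwise.

class StudentWithSerial(nn.Module):
    def __init__(self, in_dim, serial_bits, num_classes, hidden=256):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.serial_gate = nn.Linear(serial_bits, hidden)  # mixes the trigger into the features
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x, serial):
        h = self.backbone(x) * torch.sigmoid(self.serial_gate(serial))
        return self.head(h)

def distill_step(student, teacher, x, valid_serial, optimizer, T=4.0):
    """One distillation step: match the teacher under the valid serial,
    and push predictions toward uniform under a random invalid serial."""
    optimizer.zero_grad()
    with torch.no_grad():
        teacher_prob = F.softmax(teacher(x) / T, dim=-1)

    batch = x.size(0)
    # Valid serial -> distill the teacher's knowledge.
    logits_valid = student(x, valid_serial.expand(batch, -1))
    loss_valid = F.kl_div(F.log_softmax(logits_valid / T, dim=-1),
                          teacher_prob, reduction="batchmean")

    # Random (almost surely invalid) serial -> force a near-uniform, useless prediction.
    wrong_serial = torch.randint(0, 2, (batch, valid_serial.numel())).float()
    logits_wrong = student(x, wrong_serial)
    uniform = torch.full_like(teacher_prob, 1.0 / teacher_prob.size(-1))
    loss_wrong = F.kl_div(F.log_softmax(logits_wrong, dim=-1),
                          uniform, reduction="batchmean")

    loss = loss_valid + loss_wrong
    loss.backward()
    optimizer.step()
    return loss.item()
```

    Under this kind of scheme, only a customer holding the valid serial number obtains the distilled performance at deployment time, and, as the abstract notes, the embedded serial can double as a watermark for ownership verification.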

    Deep Neural Networks Explainability: Algorithms and Applications

    Deep neural networks (DNNs) are progressing at an astounding rate, and these models have a wide range of real-world applications, such as movie recommendation at Netflix, neural machine translation at Google, and speech recognition in Amazon Alexa. Despite these successes, DNNs have their own limitations and drawbacks. The most significant one is the lack of transparency behind their behavior, which leaves users with little understanding of how particular decisions are made by these models. Consider, for instance, an advanced self-driving car equipped with various DNN algorithms that does not brake or decelerate when confronting a stopped firetruck. This unexpected behavior may frustrate and confuse users, making them wonder why. Even worse, such wrong decisions could have severe consequences if the car is driving at highway speed and ultimately crashes into the firetruck. Concerns about the black-box nature of complex deep neural network models have hampered their further application in our society, especially in critical decision-making domains like self-driving cars.

    In this dissertation, we investigate three research questions: How can we provide explanations for pre-trained DNN models so as to gain insight into their decision-making process? How can we make use of explanations to enhance the generalization ability of DNN models? And how can we employ explanations to promote the fairness of DNN models?

    To address the first research question, we explore the explainability of two standard DNN architectures: convolutional neural networks (CNNs) and recurrent neural networks (RNNs). We propose a guided feature inversion framework that takes advantage of deep architectures to produce effective interpretations of CNN models. The proposed framework not only determines the contribution of each feature in the input but also provides insights into the decision-making process of CNN models. By further interacting with the neuron of the target category at the output layer of the CNN, we enforce the interpretation result to be class-discriminative. In addition, we propose a novel attribution method, called REAT, to provide interpretations for RNN predictions. REAT decomposes the final prediction of an RNN into the additive contributions of the words in the input text. This additive decomposition enables REAT to further obtain phrase-level attribution scores. REAT is generally applicable to various RNN architectures, including GRU, LSTM, and their bidirectional versions. Experimental results over a series of image and text classification benchmarks demonstrate the faithfulness and interpretability of the two proposed explanation methods.

    To address the second research question, we use explainability as a debugging tool to examine the vulnerabilities and failure modes of DNNs, which in turn yields insights for enhancing their generalization ability. We propose CREX, which encourages DNN models to focus on evidence that actually matters for the task at hand and to avoid overfitting to data-dependent bias and artifacts. Specifically, CREX regularizes the training process of DNNs with rationales, i.e., subsets of features highlighted by domain experts as justifications for predictions, to enforce that DNNs generate local explanations that conform with the expert rationales. In addition, recent studies indicate that BERT-based natural language understanding models are prone to relying on shortcut features for prediction.
    Explainability-based observations are employed to formulate a measurement that quantifies the shortcut degree of each training sample. Based on this shortcut measurement, we propose a shortcut mitigation framework, LTGR, to discourage the model from making overconfident predictions on samples with a large shortcut degree. Experimental analysis over several text benchmark datasets validates that the CREX and LTGR frameworks effectively increase the generalization ability of DNN models.

    For the third research question, explainability-based analysis indicates that DNN models trained with the standard cross-entropy loss tend to capture spurious correlations between fairness-sensitive information in encoder representations and specific class labels. We propose a new mitigation technique, named RNF, that achieves fairness by debiasing only the task-specific classification head of DNN models. To this end, we leverage samples with the same ground-truth label but different sensitive attributes, and use their neutralized representations to train the classification head of the DNN model. Experimental results over several benchmark datasets demonstrate that the RNF framework effectively reduces discrimination of DNN models with minimal degradation in task-specific performance.
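
    To make the rationale-regularization idea behind CREX more concrete, here is a minimal, hypothetical sketch assuming continuous inputs such as word embeddings: the task loss is augmented with a penalty on the input-gradient attribution mass that falls outside an expert-provided rationale mask. The function name `crex_style_loss`, the choice of input gradients as the local explanation, and the hyperparameter `lam` are assumptions for illustration, not the dissertation's exact formulation.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of rationale-based explanation regularization in the spirit of CREX.
# `rationale_mask` is 1 for features/tokens that experts marked as justification, 0 otherwise.
# Inputs are treated as continuous (e.g., word embeddings) so that input gradients exist.

def crex_style_loss(model, x, y, rationale_mask, lam=0.1):
    x = x.detach().clone().requires_grad_(True)   # enable input gradients for the saliency term
    logits = model(x)
    task_loss = F.cross_entropy(logits, y)

    # Local explanation: magnitude of the gradient of the true-class score w.r.t. each input feature.
    true_class_score = logits.gather(1, y.unsqueeze(1)).sum()
    grads = torch.autograd.grad(true_class_score, x, create_graph=True)[0]
    saliency = grads.abs()

    # Penalize attribution mass outside the expert rationale, encouraging the model
    # to rely on evidence that actually matters for the task.
    off_rationale = saliency * (1.0 - rationale_mask)
    expl_penalty = off_rationale.sum(dim=-1).mean()

    return task_loss + lam * expl_penalty
```

    A shortcut-mitigation component in the spirit of LTGR would similarly build on explanation-derived shortcut scores per training sample; it is omitted here for brevity.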

    A Theoretical Approach to Characterize the Accuracy-Fairness Trade-off Pareto Frontier

    While the accuracy-fairness trade-off has been frequently observed in the fair machine learning literature, rigorous theoretical analyses have been scarce. To demystify this long-standing challenge, this work develops a theoretical framework for characterizing the shape of the accuracy-fairness trade-off Pareto frontier (FairFrontier), determined by the set of all Pareto-optimal classifiers that no other classifier can dominate. Specifically, we first demonstrate the existence of the trade-off in real-world scenarios and then propose four potential categories to characterize the important properties of the accuracy-fairness Pareto frontier. For each category, we identify the necessary conditions that lead to the corresponding trade-off. Experimental results on synthetic data yield insightful findings about the proposed framework: (1) when sensitive attributes can be fully interpreted by non-sensitive attributes, FairFrontier is mostly continuous; (2) accuracy can suffer a sharp decline when over-pursuing fairness; and (3) the trade-off can be eliminated via a two-step streamlined approach. The proposed research enables an in-depth understanding of the accuracy-fairness trade-off, pushing current fair machine learning research to a new frontier.
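
    The notion of a Pareto frontier used above can be illustrated with a small, self-contained sketch: given (accuracy, unfairness) summaries of candidate classifiers, a classifier lies on the empirical frontier exactly when no other classifier is at least as accurate and at least as fair, with at least one of the two strictly better. The function name, the tuple format, and the example numbers below are illustrative assumptions, not the paper's construction.

```python
from typing import List, Tuple

# Illustrative sketch: extract the empirical accuracy-fairness Pareto frontier from a set of
# candidate classifiers, each summarized as (accuracy, unfairness), where unfairness is any
# violation measure to be minimized (e.g., a demographic-parity gap).

def pareto_frontier(candidates: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the candidates not dominated by any other candidate.
    (a1, u1) dominates (a2, u2) if a1 >= a2 and u1 <= u2, with at least one strict inequality."""
    frontier = []
    for i, (acc_i, unf_i) in enumerate(candidates):
        dominated = any(
            (acc_j >= acc_i and unf_j <= unf_i) and (acc_j > acc_i or unf_j < unf_i)
            for j, (acc_j, unf_j) in enumerate(candidates) if j != i
        )
        if not dominated:
            frontier.append((acc_i, unf_i))
    return sorted(frontier)

# Example: classifiers trained with increasing weight on a fairness regularizer.
models = [(0.92, 0.30), (0.90, 0.18), (0.87, 0.05), (0.85, 0.06), (0.80, 0.02)]
print(pareto_frontier(models))   # (0.85, 0.06) is dominated by (0.87, 0.05) and is dropped
```

    Sweeping a fairness-regularization weight and plotting the surviving points is one simple way to trace an empirical FairFrontier of the kind the abstract characterizes theoretically.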